Maniphest T190555

Proposal: Develop a "worklist" tool for campaigns and in-person editing events.
Closed, Resolved · Public

Authored By

	Meghasharma213
	Mar 23 2018, 6:36 PM

Description

Profile Information

IRC nickname on Freenode: megha213
Github Profile: https://github.com/MeghaSharma21
LinkedIn Profile: https://www.linkedin.com/in/megha-sharma-81834012a/
Location (country or state): Chandigarh (UT), India
Typical working hours (include your timezone): 12:00 - 02:00 (UTC+5:30)

Synopsis

Short summary describing your project and how it will benefit Wikimedia projects

As we all know, Wikipedia is a community-powered project whose quality and quantity are largely dependent on its contributors. But sometimes people aren’t able to find the right set of articles that need work and match their interests. Also, since Wikipedia is so vast, newcomers usually get lost in it.
Hence, I’ve taken up this project of ‘Building a worklist tool for campaigns and in-person editing events’.
With the help of this tool, we’d be able to create, share and modify worklists which’ll facilitate collaboration on articles which need work. Also, this tool will enable people to work on articles which fall in their areas of interest. All in all, we’ll be able to encourage more contributions by providing an intermediate platform.

The Wikipedia community will benefit from this project in the following ways:

By using this tool, people will be able to work collaboratively on articles that need contributions.
The worklists present on the tool can be used for campaigns, in-person editing events or other similar activities.
Through this tool, people can look for articles that fall in their area of interest and contribute to them.
Also, this tool can provide a good starting point for newcomers.

Possible Mentor(s)

@Surlycyborg, @Harej

Have you contacted your mentors already?

Yes

Deliverables

Describe the timeline of your work with deadlines and milestones, broken down week by week. Make sure to include time you are planning to allocate for investigation, coding, deploying, testing and documentation

During the internship, I’ll be following the Agile Model of SDLC. The whole tenure of internship will be broken down into sprints of 2 weeks each. Each sprint will consist of one full SDLC until and unless it’s an epic. Hence, continuous testing and documentation will be done.

Every task will have Acceptance Criteria (AC). A task will be considered complete only after its AC are satisfied.

For coding tasks AC will be : Proper UTs should be in place. Code review and beta testing should be done.

For design tasks AC will be : Design document should adhere to the requirements and should be approved by the mentors.

Usability Testing will be performed with representative users. Only after that will the milestone be considered complete.

Detailed Solution

The solution that I’ve designed has been explained in detail below:
The database schema for the tool has been drafted as a microtask. Link: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/1
I’ll be using Django for the backend and ReactJS for the front end.
The tool will have the following web pages -

Main page : On this page, user will be provided with these options - 1. Create Worklist, 2. Open My Worklists and 3. Search Worklists. (1 and 2 will only be for logged-in users.)

Create Worklist page : This page will contain a form for creating a worklist. The user needs to provide the following details - 1. Name, 2. Theme, 3. Description, 4. Option to add articles to the list (this will be the same as "Add articles to existing worklists" except that it will succeed only if the worklist is created successfully) and 5. Option to add the articles associated with a PetScan query to this worklist, by providing the PetScan query ID. All of this info, along with the user's username and creation_date, will be stored in the database against the unique name and id of the newly created worklist. Since the name of a worklist has to be unique, we won't allow the user to enter a name that already exists in our database. This check will run while validating the information entered by the user. It won't be limited to exact string matching; it will also catch cases like - 1. the new list's name differs from an existing list's name only by case, 2. the new list's name is a plural/singular form of an existing one, etc. (I've thought of only these so far and will update this once I get more ideas.)
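The uniqueness check described above could be sketched roughly as follows. This is only an illustration: the function names and the very naive singularization rule are assumptions made for this example, not part of the microtask code, and a real implementation would likely need a proper inflection library.

```python
import re


def _singularize(word: str) -> str:
    """Very naive singular form (an assumption, not a full inflection rule set)."""
    if len(word) > 4 and word.endswith("ies"):
        return word[:-3] + "y"          # "cities" -> "city"
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]                # "lists" -> "list"
    return word


def normalize_worklist_name(name: str) -> str:
    """Reduce a name to a comparison key: case-folded, single-spaced, singular words."""
    words = re.split(r"\s+", name.strip().casefold())
    return " ".join(_singularize(w) for w in words if w)


def is_name_taken(candidate: str, existing_names) -> bool:
    """True if the candidate collides with an existing name after normalization."""
    taken = {normalize_worklist_name(n) for n in existing_names}
    return normalize_worklist_name(candidate) in taken
```

With this sketch, "Women Scientists" and "women scientist" map to the same key, so the second name would be rejected during form validation.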

I’ve thought of having some pre-populated themes under which different lists can be categorized. This will make searching of lists easier. These themes can be decided in the requirements phase with the help of users.

Search lists page : In this, the lists will be populated in a tabular format where each row will contain the list’s name, no. of articles under it and no. of editors working on it. On this page, functionality to sort by name, date_created, date_updated, no. of articles and no. of editors will be provided. Also, user will be able to filter the lists by name and theme.

Open My Lists page : It'll be similar to the 'Search Lists' page, except that only the user’s own lists will be retrieved from the DB. Also, the user needs to be logged in to see this page. The user will see the following -
- Worklists created by the user : Since we've created an index on the created_by attribute of the Worklist table, we'll be able to fetch all the worklists created by the user without scanning the whole Worklist table.
- Tasks created by the user : Since we've created an index on the created_by attribute of the Task table, we'll be able to fetch all the tasks created by the user without scanning the whole Task table.
- Tasks claimed by the user : Since we've created an index on the claimed_by attribute of the Task table, we'll be able to fetch all the tasks claimed by the user without scanning the whole Task table.
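As a rough illustration of these lookups, here is a sketch that uses Python's built-in sqlite3 module instead of the Django models from the microtask. The table layout, column names and index names are assumptions made for this example; only the composite key on worklist and article and the indexes on created_by and claimed_by come from the description.

```python
import sqlite3

# Hypothetical schema: composite primary key plus the two secondary indexes.
SCHEMA = """
CREATE TABLE task (
    worklist_id INTEGER NOT NULL,
    article_id  INTEGER NOT NULL,
    created_by  TEXT NOT NULL,
    claimed_by  TEXT,
    status      TEXT NOT NULL DEFAULT 'open',
    PRIMARY KEY (worklist_id, article_id)
);
CREATE INDEX idx_task_created_by ON task (created_by);
CREATE INDEX idx_task_claimed_by ON task (claimed_by);
"""


def open_demo_db():
    """Create an in-memory database with a few sample tasks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO task (worklist_id, article_id, created_by, claimed_by) "
        "VALUES (?, ?, ?, ?)",
        [(1, 10, "alice", "bob"), (1, 20, "carol", None), (2, 30, "alice", None)],
    )
    return conn


def tasks_where(conn, column, username):
    """Tasks created or claimed by a user; the column name is whitelisted."""
    if column not in ("created_by", "claimed_by"):
        raise ValueError("unsupported column")
    sql = (f"SELECT worklist_id, article_id FROM task WHERE {column} = ? "
           "ORDER BY worklist_id, article_id")
    return conn.execute(sql, (username,)).fetchall()


def tasks_in_worklist(conn, worklist_id):
    """Tasks of one worklist; the leading primary-key column serves this lookup."""
    sql = "SELECT article_id FROM task WHERE worklist_id = ? ORDER BY article_id"
    return conn.execute(sql, (worklist_id,)).fetchall()
```

Each query filters on an indexed column, which is the property the list above relies on to avoid full table scans.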

Lists page : We'll fetch all the tasks belonging to a worklist by querying for the tasks which have worklistId = <worklist-page-id>. This will be fast and won't require scanning the whole table because <worklistId, articleId> is the primary key. The tasks will be shown in tabular form. Every task will be a hyperlink leading to its task page. Only the metrics that draw users to a task or put them off it will be shown on this page. These can be discussed, but for the first iteration I'm thinking of showing the status of the task (so that users only investigate Open tasks), the effort involved in the task (so that users only investigate tasks of appropriate difficulty) and the average page views of the article. Status can take 3 values - Open (needs to be worked upon), Claimed (is being worked upon) and Closed (has been worked upon). Effort can be Low, Medium or High, so that people can judge how much effort they need to put in to complete the task. On this page, functionality to sort by page views and date_updated will be provided. Also, the user will be able to filter the tasks by name, status and effort. On this page the user will be able to do the following:

Add articles to existing worklists

An existing worklist will already have its entry in the database. The user can only add articles to the worklist, not remove them (but can close the task after completion). There'll be an option to "Add article to worklist" on the worklist page, which is uniquely identified by https://tools.wmflabs.org/worklist-tool/<worklist-name> . Upon clicking the "Add article to worklist" button, the user will be required to provide the following details - 1. Article Name (we'll automatically find the Wikipedia article) 2. Description (of the problem they want solved) 3. Effort (low, medium or high - depending upon how much effort they think is required to solve the problem). All of this info, along with the user's username (created_by), worklist_id (the id of the worklist to which the user is adding the article), status (open for a newly created task), progress (0 for a newly created task), claimed_by (None for a newly created task) and date_created (the creation date), will be used to create a new task. Whenever a task is created, the corresponding article entry will be created or updated (in case the article is already involved in some other task) in the Article table. The page views for an article will be calculated using the pageviews API, projects and grade will be calculated using the pageassessments API, and size will be read from the page table.
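For the page views part, a minimal sketch of how the average could be computed from the Wikimedia REST pageviews endpoint is shown below. The function names, the default project and the choice of daily granularity with all-access/user filters are assumptions made for this example; the HTTP request itself is left out so that the snippet stays self-contained.

```python
from urllib.parse import quote

PAGEVIEWS_ENDPOINT = (
    "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"
)


def pageviews_url(article, start, end, project="en.wikipedia"):
    """Build the per-article URL; start and end are YYYYMMDD strings."""
    title = quote(article.replace(" ", "_"), safe="")
    return (f"{PAGEVIEWS_ENDPOINT}/{project}/all-access/user/"
            f"{title}/daily/{start}/{end}")


def average_daily_views(payload):
    """Average the 'views' field over the 'items' list of a decoded response."""
    items = payload.get("items", [])
    if not items:
        return 0.0
    return sum(item["views"] for item in items) / len(items)
```

The decoded JSON response would be passed to average_daily_views, and the result stored in the Article table alongside the other article properties.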

Update petscan query articles to existing worklist

Only one petscan query can be associated with a worklist. The curator of the worklist can update the petscan query id associated with that worklist by clicking on the button "Update Petscan Query ID associated with this worklist" on the worklist page.

Sharing of worklist

Whenever a worklist is created, it will be allotted an ID when it is stored in the database. A worklist can also be uniquely identified by its name. Therefore, any worklist can be identified by the URL https://tools.wmflabs.org/worklist-tool/<worklist-name> . To share a worklist, one would share either the name of the worklist or the above URL.
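Since worklist names can contain spaces or slashes, the name would need to be percent-encoded before it goes into the URL. A small sketch of that round trip, with hypothetical helper names, could look like this:

```python
from urllib.parse import quote, unquote

BASE_URL = "https://tools.wmflabs.org/worklist-tool/"


def share_url(worklist_name):
    """Shareable link for a worklist; every reserved character is escaped."""
    return BASE_URL + quote(worklist_name, safe="")


def name_from_share_url(url):
    """Recover the worklist name from a link produced by share_url."""
    if not url.startswith(BASE_URL):
        raise ValueError("not a worklist URL")
    return unquote(url[len(BASE_URL):])
```

Escaping the slash as well keeps a name such as "AC/DC albums" in a single path segment.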

Editing Worklist

In a worklist, an article once added can only be closed, not removed (so that it remains a part of the worklist even after completion. This will help newcomers look through previously completed tasks and also help us add the metrics & reporting feature later on). Also, the description and theme of the worklist will be editable but not its name (as the name is used for sharing worklists and as a primary key in the DB). The option to edit the description and theme will be provided on the worklist page itself.
Only the curator of the worklist will be able to edit the theme and the description.

Tasks page : We'll show the description of the task (as entered by the creator of the task or those who claimed it), which will be editable, along with its progress, effort and status [we'll fetch all these details from the Task table]. The progress will be marked by the user who has claimed the article. If the article hasn't been claimed, the progress will be either zero or equal to the value assigned by the last user who claimed it. Also, we'll show the properties of the article involved in this task, like the name of the article, its page_views, the projects of which it is a part, its size in bytes and its grade (like FA, A, GA etc). Also, we'll show which other tasks (and hence worklists) this article is involved in by querying all the tasks which have articleId = <article-page-id>. This will be fast and won't require scanning the whole table because we've created an index on articleId. On this page, the user will be able to do the following:

Claim an article of a worklist which is unclaimed

The user can claim an article which was previously unclaimed; their name will be entered in that task's claimed_by field in the database and the status will be changed to claimed. Because of the auto-refresh functionality, this info will be propagated to all the clients in real time.

Update progress of a task -- only by claimed_by user

Only the claimed_by user can update progress for a task

Claimed_by user can change the status from

claimed to closed -- when the user has completed the task. In that case, the task's progress will be changed to 100 and its status will be changed to closed.
claimed to open -- when the user does not want to work on the task anymore. At this time, the user can update the progress, effort and description of the task so that these stay up to date and a new user who claims the task does not have to start from scratch.

Update effort & description of a task

Only the claimed_by user can update the effort for a task, and only when changing its status from claimed to open. The initial effort estimate and description will be set by the user who added the article to the list.
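The claim, progress and status rules above can be summarised as a small state machine. The sketch below is only illustrative: the class and method names are hypothetical, and clearing claimed_by when a task goes back to open is an assumption that the proposal does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

OPEN, CLAIMED, CLOSED = "open", "claimed", "closed"


@dataclass
class Task:
    description: str
    effort: str
    created_by: str
    status: str = OPEN
    progress: int = 0
    claimed_by: Optional[str] = None

    def claim(self, user):
        """open -> claimed; only unclaimed (open) tasks can be claimed."""
        if self.status != OPEN:
            raise ValueError("only open tasks can be claimed")
        self.claimed_by, self.status = user, CLAIMED

    def _require_claimant(self, user):
        if self.status != CLAIMED or self.claimed_by != user:
            raise PermissionError("only the claimed_by user may do this")

    def update_progress(self, user, progress):
        """Progress (0-100) can be changed only by the claimed_by user."""
        self._require_claimant(user)
        if not 0 <= progress <= 100:
            raise ValueError("progress must be between 0 and 100")
        self.progress = progress

    def close(self, user):
        """claimed -> closed; progress is forced to 100."""
        self._require_claimant(user)
        self.progress, self.status = 100, CLOSED

    def release(self, user, progress=None, effort=None, description=None):
        """claimed -> open; the claimant may leave updated details behind."""
        self._require_claimant(user)
        if progress is not None:
            self.update_progress(user, progress)
        if effort is not None:
            self.effort = effort
        if description is not None:
            self.description = description
        # Assumption: the claimant is cleared, the progress value is kept.
        self.claimed_by, self.status = None, OPEN
```

In the Django app these checks would probably live in the view or model layer, with the username coming from the OAuth session.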

Note : Whenever a worklist is created or the PetScan query ID of a worklist is updated, we'll fetch the results (names of articles) of the PetScan query and store them in the PSIDArticles table. This will improve the user experience, as otherwise users would have to wait for the results to be fetched from PetScan every time they open the worklist page. But these stored results can be stale. So, in order to make the most recent results available to the user as soon as possible, once the page has loaded showing the results stored in the database, we'll send an AJAX call to the server to update the results in the database and show the updated results to the user when the call returns.
Also, as mentioned above, we'll fetch and store various properties of articles, like their average page views, size, projects and grade. We'll update these values through a cron job that will run at a fixed interval, the length of which can be discussed.
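The PetScan note above describes a "serve stored results first, refresh afterwards" pattern. A minimal in-memory sketch of it follows. The class and method names are hypothetical, the dictionary stands in for the PSIDArticles table, and the fetch callable stands in for the real PetScan request.

```python
from typing import Callable, Dict, List


class PetScanCache:
    """Stores article names per PetScan query ID (PSID)."""

    def __init__(self, fetch: Callable[[int], List[str]]):
        self._fetch = fetch                      # performs the slow PetScan call
        self._stored: Dict[int, List[str]] = {}  # stand-in for PSIDArticles

    def on_psid_set(self, psid: int) -> None:
        """Called when a worklist is created or its PSID is updated."""
        self._stored[psid] = self._fetch(psid)

    def cached(self, psid: int) -> List[str]:
        """Fast path used for the initial page load; may be stale."""
        return list(self._stored.get(psid, []))

    def refresh(self, psid: int) -> List[str]:
        """Handler for the follow-up AJAX call: refetch, store, return."""
        fresh = self._fetch(psid)
        self._stored[psid] = fresh
        return list(fresh)
```

The page would render cached(psid) immediately and replace the table contents once the call backed by refresh(psid) returns.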

Timeline:

Apr 24 - May 1

Community Bonding Period
Setting up environment on Toolforge
Setting up CI on Travis for Git repository
Understanding Wikimedia’s current tools that are of similar type. It’ll help me in identifying the areas of improvement
Studying similar existing tools that don’t belong to Wikimedia.

Note: As I’m already familiar with MediaWiki API, toolforge, git and Wikimedia’s database structure, I’d get a headstart.

May 7 - May 14

Though this is still part of the community bonding period, I will start on the implementation so that the MVP can be completed by the end of May. (Before this, I'll have end-semester exams, hence I'll only be able to start from 7th May.)
Decide upon the database to be used and then finalize the database schema.
Write the DB calls
Write module for creating worklists (It includes the validation checks mentioned above in the detailed solution)

May 14 - May 28 (Week 1, 2)

The major focus during this sprint will be to finish the basic backend functionality with minimal UI, so that MVP release can be done on 30th May.
Integrate OAuth with the tool.
Write module for finding worklists.
Implement the feature of sharing of worklists.
Implement auto-refresh functionality which will make real-time updation of data possible.
Write module for importing Petscan queries.

May 30 : Do MVP Release and gather feedback from users

May 30 - June 10 (Week 3, 4)

Write module for describing worklists created and being worked upon by the user
Write module for calculating page views of the task
Write module for calculating grading and projects (to which it belongs) of the task
Write module for fetching the size of the task's article from the database

June 10 - June 24 (Week 5,6)

Adding autocomplete feature to finding worklists module
Implement filtering and sorting mechanism for lists and articles page
Preparing UI mockup

Note: This task will help us to gain clarity about the functionality that we need to implement.
(Acceptance Criteria: Design document should adhere to the requirements and should be approved by the mentors.)

June 24 - July 8 (Week 7,8)

In this sprint I’ll be working upon the stretch goals
Write module to draw lists of articles from other sources like CSV and .txt files
Write module for exporting information about the list for use in other tools

July 8 - July 22 (Week 9,10)

Implementation of complete UI part
In this sprint, optimizations like lazy load will also be included

July 22 - Aug 12 (Week 11,12)

Write UTs for the tool
Finish up pending reviews and documentation, if any.

After Internship
Even after the internship, I plan to look out for new and challenging opportunities offered by the Wikimedia community. I will continue to maintain and contribute to the development of this tool.

Milestones:

April:
- Gained familiarity with existing tools and formats
- Setup environment and CI pipeline for the tool

May:
- Finalized database schema
- Finished back-end implementation of modules for creating, searching, importing petscan queries and sharing worklists
- Integrated OAuth with the tool
- Added auto-refresh functionality
- Do MVP Release

June:
- Added modules for finding user’s lists and filtering and searching worklists
- Prepared design mockups
- Added functionality for fetching page views, projects, grading and size of the task's article
- Added autocomplete feature in finding worklists
- Written code for 2 stretch goals

July - August:
- Finished UI for the tool
- Done with final release of the product

Participation

Describe how you plan to communicate progress and ask for help, where you plan to publish your source code, etc

I’ll be maintaining the code base using Git. Hence, PRs will be used to gather feedback from the mentors on the same.

For sharing status and discussing issues, I'll be using Phabricator and Github issues respectively. Other than this, IRC and Gmail will act as other communication mediums. I’ll be active on IRC during my working hours.

Also, I’ll be participating in scrum meetings and sprint planning. These meetings will help me get to know the community better. And to discuss tasks and issues, I’ll have a one-on-one with my mentors once every two weeks.

I’ve planned to use meta wiki user page for quick weekly updates.

I also plan to write a blog about my experiences and the challenging work that I’ll be doing as part of this project. I plan to post to it once every 2 weeks.

About Me

Tell us about a few:

Your education (completed or in progress)

I am a final year computer science undergraduate student at PEC University of Technology, Chandigarh, India. I am a Computer Science enthusiast, because it gives me the power to reach billions and transform their lives. I fell in love with mathematics and problem-solving in primary school and that love drives my programming. I enjoy Coding and Building new things.

For reference to my skill set and past projects, this is my resume. (Chiefly, these include expertise in languages like Python, HTML/CSS, Javascript and Java; frameworks like Django and Spring; and cloud computing platforms like AWS and Google Cloud.)

How did you hear about this program?

I got to know about this from my college technical society.

Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?

I will be on holiday until August 1, so I will be completely free. After that, following graduation, I will be starting my job at Amazon.

We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?

What does making this project happen mean to you?

Wikipedia has always been an ultimate destination when it comes to making college assignments or reading about a topic or finding answers to some questions.

What Google is to searching, Wikipedia is to knowledge. And Wikipedia is nothing without its community. If I get an opportunity to make a difference to the community through my tool, the happiness that I’ll get out of it will be immeasurable.

Whatever I've done so far in my computer science career has only reached a maximum of 500 people. I want to build something which will impact millions, including my friend sitting next to me. And this project gives me that opportunity, and at the same time, challenges me to learn and grow. Hence, I would love to solve this hard and complex problem.

Past Experience

Describe any relevant projects that you've worked on previously and what knowledge you gained from working on them. Describe any open source projects you have contributed to as a user and contributor (include links). If you have already written a feature or bugfix for a Wikimedia technology such as MediaWiki, link to it here; we will give strong preference to candidates who have done so

Industry Experience:

Software Engineering Intern , Amazon, Bengaluru : Jan 2017 - July 2017

I worked with the tech team that supported the retail business of Amazon.

I built a visualization platform that quantified a team’s performance with the help of metrics. The visualizations provided an insight into the data which eventually helped in increasing a team’s productivity.

I used a plethora of AWS technologies like QuickSight, Redshift, DynamoDB, Lambda etc. for making this tool. Open source tools like Kibana were used to create visualizations.

I made a webapp for the visualization platform using JSP, HTML/CSS and Javascript and integrated it with my team's website.

Multiple reviews were held with senior leaders and clients and the platform improved with multiple iterations, which in turn improved my coding, testing and leadership skills.

I got a pre-placement offer from Amazon for my exemplary work during the internship.

Researchshala - www.researchshala.com

Researchshala is an online platform to connect professors to research-interns, helping them with their research projects

I built this website in Django and hosted it on AWS for a budding startup at my college.

Research-Intern at IIT Delhi

I was among the top 0.33% students all over India who got selected for Student Research Fellowship Program, IIT-Delhi.

At IITD, I was a Research Fellow under Asst. Prof. Maya Ramanath.

I built a technical knowledge base to establish relationships between technical concepts by using Wikipedia as the input corpus and then organised relations into a hierarchical structure.

For this, I set up Hadoop for using PATTY, an MPI resource, and maintained the database using PostgreSQL and MongoDB.

Self-Projects:

Health Monitoring Application using Brain Computer Interface

I built an android app to analyse and improve user’s concentration and meditation levels.

The state of the user’s brain was monitored using Muse, an EEG device.

Binaural beats were used to improve the current levels of concentration and meditation.

Accessing and Extracting Information from the Hidden-Web

Objective was to build a database of publications and research papers of foreign academicians of Indian origin.

For this, I built a crawler that explored the Deep Web in a targeted manner after receiving a list of names of academicians as input.

Libraries like BeautifulSoup, Requests and JSoup were used for fetching and structuring the results.

This Project was funded by NSTMIS, a division of Department of Science and Technology, India.

Link: https://github.com/MeghaSharma21/AccessingDeepWeb

Intelligent Subtraction Tutor

I built a chatbot that is capable of teaching subtraction to a child by first judging its current knowledge level and then teaching it accordingly.

Facebook Messenger API was used to host the bot on Modulus, having Facebook messenger as the interface.

wit.ai was used to train the bot.

Link: https://github.com/MeghaSharma21/intelligent-subtraction-tutor

Password Based Door Lock

I built a password-based door-lock system, integrated with an IR-based motion sensor, using an 8051 microcontroller.

The microcontroller was programmed in C.

Link: https://github.com/MeghaSharma21/Password-Based-Door-Lock

Open Source contributions:

Developed a user contribution summary tool named WikiCV for Wikipedia under Outreachy Round 15. This tool summarizes user contributions in an automated manner and presents them in a CV-like format. It was developed with the intent of creating a powerful force through the tool - 1. to draw new editors to the project; 2. to allow existing editors to spend more time on it and 3. to give editors a professional edge by showing off their contributions. Link for the tool: https://tools.wmflabs.org/outreachy-wikicv/wiki-cv/Tgr/ , Link for the repository: https://github.com/MeghaSharma21/WikiCV , Link for sprint board: https://phabricator.wikimedia.org/tag/wikicv/

Developed a Sublime Package for C++ code snippets, which contained templates for algorithms commonly used in competitive programming. This was developed with the intent to help programmers solve questions with greater speed and accuracy. Link: https://github.com/MeghaSharma21/CPP_Competitive_Programming_Sublime_Snippets

Developed an Atom Package for competitive programmers along the same lines as the aforementioned Sublime package. Link: https://github.com/MeghaSharma21/CPP_Competitive_Programming_Atom_Snippets

Note : Both the packages are available with their respective package managers, having 500+ installations.

Created a tool on Toolforge which takes a Wikipedia username and shows recent edits by the user. The results are presented in an eye-catching way, showing the timestamp, comment, title, article-size and diff-link of each edit. For this, I made an ajax call to the mediawiki database to fetch recent edits of a user and used Javascript and CSS to present the results. Link to Code: https://github.com/MeghaSharma21/outreachy-wikimedia-recent-user-edits-tool/tree/review , Link to Tool: https://tools.wmflabs.org/outreachy-recent-edits-tool/

Created a tool on toolforge to show the rank of a user based on no. of edits in Hindi Wikipedia. The rank and percentile of the user is presented along with the graph showing distribution of all the users according to number of edits which adds more clarity as to where the user stands. For this, I developed a Django app and the User table was taken as a data source. Link to Code: https://github.com/MeghaSharma21/outreachy-wikimedia-user-rank-tool/pull/1 , Link to Tool: https://tools.wmflabs.org/outreachy-user-ranking-tool

Created a tool on Toolforge that shows the contributions of a user in terms of no. of pages edited and created in the present year. The contributions have been shown in the form of a timeline similar to the Github’s contribution timeline. Link to tool: https://tools.wmflabs.org/outreachy-user-contribution-tool/ , Link to code: https://github.com/MeghaSharma21/outreachy-wikimedia-user-contribution-tool/pull/1

Micro-tasks Completed:

Microtask 1
Designed the database schema for the tool. It is shown in the form of a class diagram since, in Django, the database schema is declared as a collection of classes in the app's models.py file.
Link: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/1

Microtask 2
Developed a tool on Toolforge that takes the Petscan query ID as input, fetches the results from Petscan, parses them and presents in the form of cards.
Link for the code: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/2
Link for the tool: https://tools.wmflabs.org/gsoc-petscan-query-articles/

Microtask 3
Developed a tool on Toolforge that is mainly a prototype/demo for creating and showing worklists with minimal UI. Through this, I've achieved basic functionality and end-to-end connectivity. All of it will be reused for the main tool.
Link for the code: https://github.com/MeghaSharma21/WorklistTool-GSoC-2018/pull/3
Link for the tool: https://tools.wmflabs.org/worklist-tool

Any Other Info

Add any other relevant information such as UI mockups, references to related projects, a link to your proof of concept code, etc

Through my Outreachy internship with Wikimedia, I've become quite familiar with the Wikimedia community and tools. Also, the technology stack used in WikiCV is pretty much the same as what we would want to use for this tool. All this will give me a head start.

Related Objects

Mentioned In: T231891: Create a Generic List-building tool that can meet and exceed the applications of Pagepile
T187305: [#1Lib1Ref] Build a "worklist" tool for campaigns and in-person editing events.

Event Timeline

Meghasharma213 created this task. Mar 23 2018, 6:36 PM

Restricted Application added a subscriber: Aklapper. Mar 23 2018, 6:36 PM

Aklapper added a project: Google-Summer-of-Code (2018). Mar 23 2018, 7:04 PM

Meghasharma213 renamed this task from Build a "worklist" tool for campaigns and in-person editing events. to Proposal: Develop a "worklist" tool for campaigns and in-person editing events. Mar 23 2018, 7:04 PM

Meghasharma213 updated the task description.

Meghasharma213 updated the task description. Mar 23 2018, 7:30 PM

srishakatux moved this task from Backlog to Proposals In Progress on the Google-Summer-of-Code (2018) board. Mar 23 2018, 7:31 PM

Very nice, thank you! I'll make some inline comments below, but generally I like the extra features you've suggested, and I'd like to see a bit more detail on how you'd design and implement some of the things you've mentioned. Do let me know if you have any questions of course.

However, please bear in mind that we already have an initial set of requirements listed in the project description, mandatory and optional, so I think it's worth mentioning explicitly which ones you'd like to focus on in the proposal, and hopefully that can cut some of the time you've allocated to gathering requirements. These requirements came from @Sadads who will basically be our "customer" in this project and point of contact with end users in campaigns.

For coding tasks AC will be : Proper UTs should be in place. Code review and beta testing should be done.

We'll definitely be doing code reviews, but I wouldn't formally block on having many unit tests, though it's nice that you mention it. We'll be iterating a bit until we find out what we actually want the tool to look like, and I'd rather spend time building an MVP than on tests at first.

Design document should adhere to the requirements and should be approved by the mentors.

I wouldn't mind skipping the extra document if we mostly agree on what an MVP looks like while writing this proposal.

Usability Testing will be performed with representative users.

It will definitely be awesome if we get to use this tool in a campaign during development and gather feedback, and I know @Sadads has offered to help here, but this might not be entirely within our control to the point that we can block milestones on it. So as above, I'd suggest mentioning it in the proposal, but not necessarily making it a formal requirement in the development process.

I’ll be using Django for the backend and ReactJS for the front end.

I don't see ReactJS on your previous experience, do you have any experience with it? Just asking out of curiosity, I personally haven't used it but would be happy to learn, but I'd be interested in knowing your reasons for choosing it if you're familiar with it. (Wanting to learn it is a valid reason btw).

Create Worklist page (...)

Can you please expand on this a little -- in what ways is a user allowed to add articles to a worklist? Can they edit it later by adding/removing articles, or description and themes? If so, who can edit it and in what ways?

Open my lists & Search lists page

I like these, but note that, per the requirements in the project description, it is actually more important for a user to be able to create a worklist and easily pass around a link to it than to be able to explore and find random worklists created by other users, at least for the MVP. Please mention the sharing aspect of it, bonus points if it has some high-level technical details like some ideas for how to generate/share these links.

Lists page : On this page list of articles within a worklist will be listed in a tabular format. Each row will contain name of article, average page views, status and effort estimated to complete the task.

Why average page views and where would we get it from?

I like the idea of effort estimates. How are they set?

Also note that we'd like to see who has claimed which articles, per the requirements in the project description.

Also, there's a (near) real-time component to this page: we'd like status updates to propagate across the different users viewing a worklist. Can you please include some technical details for how this would be implemented?

Articles page

I'm not convinced by this one. This does not add a lot of information to what the article itself already provides, so I think we should just link to the article in the lists page for simplicity. I do like the idea of a progress indicator, which can also go in the lists page.

I've focused on the proposal itself rather than the timeline (I hope to have time to come back to it later though), but in general I'd say we should spend more time in the beginning putting together a MVP than refining documentation and requirements. That's because we already have a preliminary set of requirements and having a simple version of the tool will be a great way to tell what works and what doesn't, so we can iterate on that.

Also, I don't know the exact date, but I believe there will be a #1Lib1Ref South edition in May (https://blog.wikimedia.org/2018/03/22/building-a-better-1lib1ref/), so it would be extremely cool if we had a version of the tool for people to try then, even if it's a really basic one, just to see whether the idea makes sense.

Setting up *CI on Travis* for Git repository

This can likely be skipped or left to the very end. I'd suggest replacing with setting up OAuth, since that requires an approval process. I think we should think about OAuth pretty early rather than at the end as you mentioned.

Thanks again for this, and looking forward to seeing the next draft!

Meghasharma213 updated the task description. Mar 25 2018, 6:27 AM

I'd like to see a bit more detail on how you'd design and implement some of the things you've mentioned. Do let me know if you have any questions of course.

I've added more detail in the proposal itself. Kindly look through it. If it still lacks some detail, let me know.

However, please bear in mind that we already have an initial set of requirements listed in the project description, mandatory and optional, so I think it's worth mentioning explicitly which ones you'd like to focus on in the proposal, and hopefully that can cut some of the time you've allocated to gathering requirements. These requirements came from @Sadads who will basically be our "customer" in this project and point of contact with end users in campaigns.

Okay, I've moved up the timeline accordingly.

We'll definitely be doing code reviews, but I wouldn't formally block on having many unit tests, though it's nice that you mention it. We'll be iterating a bit until we find out what we actually want the tool to look like, and I'd rather spend time building an MVP than on tests at first.

Sure, I've moved tests to the last sprint.

I wouldn't mind skipping the extra document if we mostly agree on what an MVP looks like while writing this proposal.

Okay. We'll then finalize the UI mockup only and skip the design document part.

It will definitely be awesome if we get to use this tool in a campaign during development and gather feedback, and I know @Sadads has offered to help here, but this might not be entirely within our control to the point that we can block milestones on it. So as above, I'd suggest mentioning it in the proposal, but not necessarily making it a formal requirement in the development process.

In the new timeline, I'm targeting an MVP release on May 30. It'll contain all the basic requirements implemented with a minimal UI. It'll help us gather early feedback and iterate on it.

I don't see ReactJS on your previous experience, do you have any experience with it? Just asking out of curiosity, I personally haven't used it but would be happy to learn, but I'd be interested in knowing your reasons for choosing it if you're familiar with it. (Wanting to learn it is a valid reason btw).

Yes, I'm familiar with ReactJS, though not very experienced with it. I still want to use it for this project because of its reusable UI components and pagelets. In my last tool I had a hard time because of the absence of pagelets. Since here too we'll be dealing with heavy computations and transfer of data from backend to frontend, I'd like to use ReactJS. And yes, I want to become better at it too :). If I get selected, I'll learn it before the official coding period starts, so that it doesn't interfere with the proposed timeline.

Can you please expand on this a little -- in what ways is a user allowed to add articles to a worklist? Can they edit it later by adding/removing articles, or description and themes? If so, who can edit it and in what ways?

A user can add articles either by entering the title of the article or through a PetScan query. Once added, an article can only be closed, not removed, so that it remains part of the worklist even after completion. This will help newcomers look at previously completed tasks and will also help us add the metrics & reporting feature later on. The description and theme of the worklist will be editable, but not its name, as the name is used for sharing worklists and as a primary key in the DB. The option to edit the description and theme will be provided on the worklist page itself. I'm thinking of giving access to update the description and theme only to the creator of the worklist, but we can discuss whether we also want to give this access to worklist contributors!

I like these, but note that, per the requirements in the project description, it is actually more important for a user to be able to create a worklist and easily pass around a link to it than to be able to explore and find random worklists created by other users, at least for the MVP. Please mention the sharing aspect of it, bonus points if it has some high-level technical details like some ideas for how to generate/share these links.

We can cover this post MVP. And the details of sharing feature have been added in the proposal.

Why average page views and where would we get it from?

Average page views per day will signify how popular the article is. With this information, users can choose popular articles and contribute to them. This will benefit them, as those contributions will show up in the CV prepared by the WikiCV tool.
We'll get the page views for the past 60 days from this MediaWiki API: https://en.wikipedia.org/w/api.php?action=help&modules=query%2Bpageviews
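
As a rough sketch of the averaging step: this assumes the API is queried with `prop=pageviews` and `pvipdays=60`, and that it returns a date-to-count mapping in which days without data are `null`. The function names are hypothetical and not part of the proposal.

```python
from typing import Mapping, Optional
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def pageviews_query_url(title: str, days: int = 60) -> str:
    """Build the query URL for the daily page views of one article.

    Assumption: the pageviews module accepts `pvipdays` (up to 60 days).
    """
    params = {
        "action": "query",
        "prop": "pageviews",
        "titles": title,
        "pvipdays": days,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def average_daily_views(daily: Mapping[str, Optional[int]]) -> float:
    """Average the per-day counts, skipping days with no data (None)."""
    counts = [n for n in daily.values() if n is not None]
    return sum(counts) / len(counts) if counts else 0.0
```

The mapping passed to `average_daily_views` would be the `pageviews` object of a single page in the response. Skipping empty days rather than counting them as zero is a design choice that could go either way.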

I like the idea of effort estimates. How are they set?

Initially, they'll be set by the user who adds the task to the worklist. After that, the estimate can be modified by a user who had claimed the article but no longer wants to work on it: they'll be allowed to change the task's effort when changing the status from "claimed" to "open".

Also note that we'd like to see who has claimed which articles, per the requirements in the project description.

Yes, it'll be shown on the respective task's page. But the status of the task will be shown on the worklist page itself, so that users can look into only the open tasks.

Also, there's a (near) real-time component to this page: we'd like status updates to propagate across the different users viewing a worklist. Can you please include some technical details for how this would be implemented?

I'm planning to implement it through an auto-refresh feature. Basically, after a time period x, a call will be made to fetch updated data from the database. After the data is updated, the affected components will be refreshed and the new data displayed. Note that we won't refresh the whole page, only the components that change. This is where ReactJS will be very useful. It'll lead to a better UX.
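
To illustrate the "refresh only what changed" part, here is a minimal sketch that compares two polling snapshots. It is written in Python for illustration only; in practice this comparison could live in the frontend or in a Django view. The task IDs, status strings and the function name are assumptions.

```python
from typing import Dict, List


def changed_tasks(old: Dict[int, str], new: Dict[int, str]) -> List[int]:
    """Return IDs of tasks that are new or whose status differs from the last poll."""
    return sorted(tid for tid, status in new.items() if old.get(tid) != status)


# Snapshot from the previous poll and from the current one.
previous = {1: "open", 2: "claimed"}
latest = {1: "claimed", 2: "claimed", 3: "open"}

# Only tasks 1 and 3 need their rows re-rendered.
to_refresh = changed_tasks(previous, latest)
```

Returning only the changed IDs keeps each poll response small, which matters if many users have the same worklist open.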

I'm not convinced by this one. This does not add a lot of information to what the article itself already provides, so I think we should just link to the article in the lists page for simplicity. I do like the idea of a progress indicator, which can also go in the lists page.

Actually, I'm displaying a lot of information about the article involved in the task on the tasks page, like its projects, grades, status etc. (mentioned in the proposal). I feel this information is important, and moving it to the worklist page would create clutter. So I wanted to keep only the most relevant information on the worklist page. But if you're not convinced, we can remove this part.

Also, I don't know the exact date, but I believe there will be a #1Lib1Ref South edition in May (https://blog.wikimedia.org/2018/03/22/building-a-better-1lib1ref/), so it would be extremely cool if we had a version of the tool for people to try then, even if it's a really basic one, just to see whether the idea makes sense.

Sure, I'll be targeting this!

This can likely be skipped or left to the very end. I'd suggest replacing with setting up OAuth, since that requires an approval process. I think we should think about OAuth pretty early rather than at the end as you mentioned.

Okay, I'll complete both of these at the start. I'm already familiar with Travis, so it won't take me much time to set it up. And I don't want to remove it because it makes the developer experience better.

Thanks a lot for your detailed feedback. I'm looking forward to more such discussions :).

Meghasharma213 updated the task description. Mar 25 2018, 7:42 AM

Meghasharma213 updated the task description. Mar 25 2018, 7:49 AM

Meghasharma213 updated the task description. Mar 25 2018, 8:20 AM

Meghasharma213 updated the task description. Mar 25 2018, 8:23 AM

Meghasharma213 updated the task description.

Cool, this looks great and is just about ready for final submission. I've made a few more comments below but none are blocking to the proposal -- I'm convinced you understand the problem and have a plan to solve it :)

Actually, I'm displaying a lot of information about the article involved in the task on the tasks page, like its projects, grades, status etc. (mentioned in the proposal).

Hmm, yeah, I guess I don't see all that information as very important to have in our tool, but I could be convinced otherwise. Let's leave it in the proposal so we can discuss with the other mentors too.

Since the name for a worklist has to be unique, we'll not allow the user to enter a name which is already there in our database.

It sounds like this wouldn't even allow different users to have worklists with the same name, which could get very frustrating.

Here's a couple of alternative ideas we can consider:

Making the IDs random strings rather than sequential numbers, so the user-entered name is not identifying at all and we don't need to worry about collisions. Interestingly, this also makes IDs harder to guess, so worklists are "pseudo-private" by default: you need to know a specific URL to find it. A nice UX touch would be to use IDs that are random but easy to read for humans, like gfycat.com does: https://gfycat.com/DizzyCourteousErne.
Add the creator to the ID and URL (e.g., "Surlycyborg/MyWorklist"). That way different users can at least reuse the same worklist name without colliding, and it's nice that you can tell who owns a worklist by just looking at its name.
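
The first idea (random but readable IDs) could look roughly like the sketch below. The word lists, the function names and the retry limit are all placeholders; real lists would need to be far larger to keep collisions rare and IDs hard to guess.

```python
import secrets
from typing import Set

# Tiny placeholder word lists, for illustration only.
ADJECTIVES = ["Dizzy", "Courteous", "Brave", "Quiet", "Gentle", "Clever"]
ANIMALS = ["Erne", "Otter", "Heron", "Lynx", "Badger", "Finch"]


def readable_id() -> str:
    """Return a random AdjectiveAdjectiveAnimal string."""
    return (
        secrets.choice(ADJECTIVES)
        + secrets.choice(ADJECTIVES)
        + secrets.choice(ANIMALS)
    )


def unique_readable_id(taken: Set[str], max_tries: int = 100) -> str:
    """Draw IDs until one is not already in `taken`."""
    for _ in range(max_tries):
        candidate = readable_id()
        if candidate not in taken:
            return candidate
    raise RuntimeError("could not find a free ID; the word lists are too small")
```

With this approach the user-entered name becomes a plain display field, so two worklists can share a name without any uniqueness check.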

We don't have to make a decision now necessarily, just thought I'd point out these possibilities.

The page views for an article will be calculated using pageviews API, projects and grade will be calculated using pageassessments API and size will be calculated from page table

These are fetched once when the article is created and put in our database, if I understand correctly. That's totally fine, but how/when would they be updated from then on?

The user can update the petscan query id associated with that worklist by clicking on the button "Update Petscan Query ID associated with this worklist" on the worklist page. We’ll be maintaining who added this Petscan ID so that this worklist can be shown in ‘Open My Worklists’ for the user.

It probably makes sense to only allow the worklist creator to do this, in which case I think the second sentence doesn't apply? Also what happens when they change the PSID for a worklist -- do we replace the entire contents of the worklist with the result of the new query, or something else?

Personally I'd even be fine with not allowing this at all and having the user create a separate worklist if they change their mind on which PSID to use (and also delete the old one of course), but this is not a very strong opinion.

A related idea we've been talking about is re-running the query using the same PSID to get an updated list of articles, either triggered by the worklist creator or automatically. This is not a high priority feature but it might be worth thinking about how that could work.

For sharing status and discussing issues, I’ll be using Phabricator.

I think we could just use GitHub issues.

I’ve planned to use meta wiki user page to keep a record of my progress in a shareable format. I’ll be updating it every week.
I’ve also thought to write a blog about my experiences and challenging work that I’ll be doing as a part of this project. I plan to write it once in every 2 weeks.

We probably wouldn't need both of these. Suggestion: quick updates (just a few bullet points really) weekly in a Meta page, and blog posts after major milestones like having a MVP, seeing it used in a campaign etc.

Thanks!

Cool, this looks great and is just about ready for final submission. I've made a few more comments below but none are blocking to the proposal -- I'm convinced you understand the problem and have a plan to solve it :)

Thanks a lot! :)

It sounds like this wouldn't even allow different users to have worklists with the same name, which could get very frustrating.

My idea was that if a list with the same name already exists, then why create a new one? Why can't people just add the articles to the existing list? But still, we can discuss it before implementation. For now, should I leave it as is in the proposal?

It probably makes sense to only allow the worklist creator to do this, in which case I think the second sentence doesn't apply? Also what happens when they change the PSID for a worklist -- do we replace the entire contents of the worklist with the result of the new query, or something else?

I wasn't thinking of limiting the update of the PSID to the creator only. What if the creator is no longer an active member? Or someone finds a better PetScan query? Again, it can be discussed. For now, I'm limiting it to the creator only.
Addressing the second part of the question, since we aren't storing the results of the petscan query, the results would change with the change in PSID.

A related idea we've been talking about is re-running the query using the same PSID to get an updated list of articles, either triggered by the worklist creator or automatically. This is not a high priority feature but it might be worth thinking about how that could work.

I had thought of automatic refresh for that, but I still need to think through its implementation details. Will get back to you by tonight.

I think we could just use GitHub issues.

Sure.

We probably wouldn't need both of these. Suggestion: quick updates (just a few bullet points really) weekly in a Meta page, and blog posts after major milestones like having a MVP, seeing it used in a campaign etc.

Works for me.

Thanks.

Meghasharma213 updated the task description. Mar 26 2018, 4:14 AM

My idea was that if a list with the same name already exists, then why create a new one? Why can't people just add the articles to the existing list? But still, we can discuss it before implementation. For now, should I leave it as is in the proposal?

Sure, we don't need a decision on this right now, let's get the other mentors to weigh in as well.

I guess the more we restrict actions to the creator only, the more it would make sense to have separate worklists. If we both want to use a name ("Unverified articles"), but don't agree on the exact PetScan query to use, and neither can change the other's list, might as well have separate lists.

Another example to consider is http://etherpad.wikimedia.org/, which interestingly supports both creating/opening a pad by name (presumably for ease of sharing) and creating one with a random ID.

Addressing the second part of the question, since we aren't storing the results of the petscan query, the results would change with the change in PSID.

Oh wait, we do store the results, right? Not everything that comes back from PetScan, but the query returns articles which we'd be putting in our database so we can track their statuses etc. We probably also want to do that because of how slow PetScan can be -- we don't want people waiting 1min to open a worklist while we run a query in the server.

So yeah, if we want autorefresh and to allow users to change the PSID of a list, we'll need to figure out how to merge the new results with the old, or whether the old are just replaced entirely, or what else happens. Again, my understanding is that these are not high priority features, but if you have thoughts on them, do include in the proposal please.

Meghasharma213 updated the task description. Mar 27 2018, 12:06 PM

Meghasharma213 updated the task description. Mar 27 2018, 1:00 PM

Oh wait, we do store the results, right? Not everything that comes back from PetScan, but the query returns articles which we'd be putting in our database so we can track their statuses etc. We probably also want to do that because of how slow PetScan can be -- we don't want people waiting 1min to open a worklist while we run a query in the server.

Ah yes, we're storing the results. I went a bit wrong in the explanation. The process of updating the results has been added to the proposal.

So yeah, if we want autorefresh and to allow users to change the PSID of a list, we'll need to figure out how to merge the new results with the old, or whether the old are just replaced entirely, or what else happens. Again, my understanding is that these are not high priority features, but if you have thoughts on them, do include in the proposal please.

It makes sense to simply replace it, because if the user wanted to merge them, he/she could have done so in PetScan itself. And as per the updated database design, it won't be difficult to replace the contents.
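
A minimal sketch of the replace strategy, assuming the worklist is held as a title-to-status mapping (the actual database design is in the proposal and is not reproduced here). The function name and the "open" default are hypothetical. Whether an article present in both the old and new results should keep its old status is left open; this sketch resets everything.

```python
from typing import Dict, Iterable, List, Tuple


def replace_contents(
    current: Dict[str, str], new_titles: Iterable[str]
) -> Tuple[Dict[str, str], List[str]]:
    """Swap a worklist's articles for the new query's results.

    `current` maps article title -> status. Every article from the new
    query starts as "open"; titles absent from the new results are
    returned separately so the caller can log or report them.
    """
    fresh = {title: "open" for title in new_titles}
    dropped = sorted(t for t in current if t not in fresh)
    return fresh, dropped
```

Returning the dropped titles makes it possible to tell the creator which articles, including claimed ones, would disappear before the change is committed.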

Jatin0312 moved this task from Proposals In Progress to Proposals Submitted on the Google-Summer-of-Code (2018) board. Apr 2 2018, 1:09 PM

Jatin0312 moved this task from Proposals Submitted to Proposals In Progress on the Google-Summer-of-Code (2018) board. Apr 2 2018, 1:22 PM

srishakatux moved this task from Proposals In Progress to Accepted Proposals on the Google-Summer-of-Code (2018) board. Apr 24 2018, 5:12 AM

Ragesoss subscribed. May 1 2018, 7:30 PM

srishakatux mentioned this in T187305: [#1Lib1Ref] Build a "worklist" tool for campaigns and in-person editing events. Sep 6 2018, 1:37 AM

Surlycyborg closed this task as Resolved. Sep 6 2018, 8:24 AM

Astinson mentioned this in T231891: Create a Generic List-building tool that can meet and exceed the applications of Pagepile. Sep 3 2019, 2:45 PM