
Proposal: [#1Lib1Ref] Build a "worklist" tool for campaigns and in-person editing events.
Closed, DeclinedPublic

Description

Profile

Name: Abel Lifaefi Mbula
IRC nick: bam (on Freenode)
Github: https://github.com/Bam92
Location (country or state): DRC
Typical working hours: 9am - 1pm and 6pm - 10pm (UTC+2)

Abstract

In various Wikipedia outreach campaigns and events (editathons), participants often want to collaborate on improving articles that share a common theme or perhaps exhibit a common problem, such as being too short or lacking sufficient citations.

Event organizers currently build their worklists manually on-wiki, an approach that:

  • can be challenging for new users to manage, or
  • requires using tools not designed for tracking completion

This project aims to develop a new tool to facilitate collaboration on worklists of articles that could be used for campaigns. The tool will make it possible to:

  • Define new worklists of articles (examples: “Articles with no references”, “Articles about Colombia”, …)
  • Import worklists from PetScan queries (http://petscan.wmflabs.org/)
  • Share worklists with other users using a link
  • Manually list the articles inside a worklist, claim or assign one to work on and mark it completed when done
  • View which articles are claimed by which other users and whether they're being worked on or already done, in real time
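The claim and completion steps above could be sketched as follows. This is only an illustration under assumptions: the `article` table, its columns, and the function names are hypothetical, and SQLite is used purely so the example is self-contained. The point is that a conditional UPDATE lets only one user claim a given article, which is what prevents two people from working on it at once.

```python
import sqlite3


def make_demo_db():
    """Create an in-memory database with one unclaimed article (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE article ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " claimed_by TEXT,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO article (title) VALUES ('Kinshasa')")
    conn.commit()
    return conn


def claim_article(conn, article_id, username):
    """Claim an article only if nobody has claimed it yet; return True on success."""
    cur = conn.execute(
        "UPDATE article SET claimed_by = ? WHERE id = ? AND claimed_by IS NULL",
        (username, article_id),
    )
    conn.commit()
    return cur.rowcount == 1


def mark_done(conn, article_id, username):
    """Mark an article completed; only the user who claimed it may do so."""
    cur = conn.execute(
        "UPDATE article SET done = 1 WHERE id = ? AND claimed_by = ?",
        (article_id, username),
    )
    conn.commit()
    return cur.rowcount == 1
```

With this shape, a second call to `claim_article` for the same article by a different user returns False instead of silently overwriting the first claim.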

Mentor: @Surlycyborg (python and toolforge experience)
Co-mentors: @Harej (backup), @Sadads (Wikipedia Library/#1Lib1Ref point of contact)

Persona

Junias Mbele, 27 years old, software dev and free culture enthusiast
Junias is a free/open source software dev who is passionate about the free culture community. In his spare time he often organizes Wikipedia editing events with colleges and students. He prepares his worklists manually or, rarely, uses the PetScan tool. Unfortunately, he is not fully satisfied when it comes to tracking users' contributions or managing worklists. Because things are done manually, editing conflicts are very frequent: two or more users end up working on the same article at the same time.

To track users' contributions, he creates a hashtag and uses the Hashtags tool. Unfortunately, users often forget to use the hashtag, so some articles are not tracked at all.

Merveille Kalenga, 20 years old, student in medicine
Merveille is studying medicine. She uses Wikipedia as a go-to source for resolving arguments with friends and even with professors, e.g. about the history of medicine. She sometimes finds that Wikipedia lacks information or reliable citations, so she participates in Wikipedia editing events to help improve the encyclopedia. One thing she hates about these events is that editing conflicts with other users still occur.

Timeline

Before April 23:

  • Familiarize myself with Toolforge, the hosting service for my tool
  • Study the MediaWiki API
  • Find and read all the necessary documentation

April 23 - May 14 (Before the official coding time):

  • Prepare the specs (functional and technical)
  • Stay in constant touch with mentors and some event organizers so that I can understand their real needs in depth and build a tool that meets their expectations
  • To design the database architecture and refine specs

July 25 – July 31:

  • Documentation

A buffer of two weeks has been kept for any unpredictable delay.

Deliverables

  • Specs
  • Creating Tasks in Phabricator.
  • Getting Tools Lab access.
  • Creation of the UI templates.
  • Design the Flask app frontend components.
  • Design the Flask app backend components.
  • Providing OAuth authentication with Wikipedia.

Participation

Describe how you plan to communicate progress and ask for help, where you plan to publish your source code, etc

About Me

Tell us about a few:

  • Your education (completed or in progress)
  • How did you hear about this program?
  • Will you have any other time commitments, such as school work, another job, planned vacation, etc, during the duration of the program?
  • We advise all candidates eligible for Google Summer of Code and Outreachy to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
  • What does making this project happen mean to you?

Past Experience

Describe any relevant projects that you've worked on previously and what knowledge you gained from working on them. Describe any open source projects you have contributed to as a user and contributor (include links). If you have already written a feature or bugfix for a Wikimedia technology such as MediaWiki, link to it here; we will give strong preference to candidates who have done so

Any Other Info

Add any other relevant information such as UI mockups, references to related projects, a link to your proof of concept code, etc

Event Timeline

BamLifa renamed this task from [#1Lib1Ref] Build a "worklist" tool for campaigns and in-person editing events. to Proposal: [#1Lib1Ref] Build a "worklist" tool for campaigns and in-person editing events. (Mar 6 2018, 7:40 AM)
BamLifa updated the task description.

Hey, thank you for this! By the way, I'd left some comments on https://semestriel.framapad.org/p/functional-spec-gsoc18 which you could address here.

In this draft, I can see you understand the problems and the features we'll need very well, but in general I think I'd like to see some more technical descriptions of how this would be implemented.

Here's a few questions you could answer, no need to go into a lot of detail:

  • What information about a worklist would you keep in the database?
  • You mentioned Flask on the server side, are you planning on using any JavaScript libraries or frameworks for the client side? You don't *need* to, but if you are, it's worth mentioning here.
  • When a user visits the page for a worklist, what information will they see about each article?
  • The results of a PetScan query can change over time as articles change categories, for example. Would we re-run the query to refresh the worklist? How?
  • This app will need some real-time functionality: all users viewing a worklist should be able to see updates to the information of an article in more-or-less real time. Any ideas how we could make that work?

What information about a worklist would you keep in the database?

I'll create more than one table for a worklist. In general, this is the information I'll need:

  • id of the worklist
  • creator username
  • date of creation
  • PSID (the PetScan query ID)
  • Language of the query
  • all generated articles
  • the link of the worklist
  • all users who claim articles
  • status of articles (not yet started (0 %), ongoing (50 - 80 %), done (100 %))
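As a rough illustration of how the items listed above might map onto tables, here is a minimal sketch. Everything in it is an assumption made for the example: the table and column names are hypothetical, and SQLite is used only so the snippet runs on its own (a Toolforge deployment would more likely use its hosted database service).

```python
import sqlite3

# Hypothetical layout: one row per worklist, one row per article in a worklist.
SCHEMA = """
CREATE TABLE worklist (
    id          INTEGER PRIMARY KEY,
    creator     TEXT NOT NULL,      -- creator username
    created_at  TEXT NOT NULL,      -- date of creation
    psid        INTEGER,            -- PetScan query ID, if imported
    language    TEXT,               -- language of the query
    share_token TEXT UNIQUE         -- used to build the worklist link
);
CREATE TABLE article (
    id          INTEGER PRIMARY KEY,
    worklist_id INTEGER NOT NULL REFERENCES worklist(id),
    title       TEXT NOT NULL,
    claimed_by  TEXT,               -- username of the claiming user, if any
    progress    INTEGER NOT NULL DEFAULT 0,  -- 0 to 100 (%)
    UNIQUE (worklist_id, title)
);
"""


def create_db(path=":memory:"):
    """Open a database at the given path and create the sketch tables."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping the claiming user and progress on the article row means the "users who claim articles" and "status of articles" items need no separate lookup when a worklist page is rendered.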

You mentioned Flask on the server side, are you planning on using any JavaScript libraries or frameworks for the client side? You don't *need* to, but if you are, it's worth mentioning here.

I have not yet decided on any specific JS framework or library. jQuery is the only one I have in mind.

When a user visits the page for a worklist, what information will they see about each article?

The list of articles inside that worklist, the users who have claimed articles, and the completion status of each article. This information will be shown in a table.

The results of a PetScan query can change over time as articles change categories, for example. Would we re-run the query to refresh the worklist? How?

I was also thinking about this possibility. But there is something to consider before deciding: I plan to allow the creator of the worklist to manually add or delete articles. So, if we re-run the PS query, we will have to overwrite the articles stored in the db, and thus lose the information added manually. To avoid this, I have decided not to activate this (refresh) button in the first release of the app.

This app will need some real-time functionality: all users viewing a worklist should be able to see updates to the information of an article in more-or-less real time. Any ideas how we could make that work?

I don't yet have a clear idea of how we will do this. My thinking is to use AJAX.
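One simple way to get "more-or-less real time" with AJAX is polling: the page asks the server every few seconds for anything that changed since its last request. The sketch below shows only the server-side logic, under assumptions: `ArticleState`, `changes_since`, and the per-worklist version counter are hypothetical names, and a Flask route would wrap this function and return the result as JSON.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class ArticleState:
    title: str
    claimed_by: Optional[str]
    progress: int
    version: int  # value of the worklist's change counter at this article's last change


def changes_since(articles: List[ArticleState], last_seen: int) -> dict:
    """Return the articles changed after `last_seen`, plus the newest version number.

    The client stores `latest` and sends it back as `last_seen` on its next poll,
    so each response carries only what the client has not seen yet.
    """
    changed = [asdict(a) for a in articles if a.version > last_seen]
    latest = max([a.version for a in articles] + [last_seen])
    return {"latest": latest, "changed": changed}
```

Polling is the easiest option to build with jQuery alone; techniques such as WebSockets or server-sent events would reduce latency at the cost of more moving parts.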

Great, thanks for the answers! By the way, you're probably planning to do this already, but please move the results of these conversations we're having into the proposal itself at some point, just so it's complete and self-contained.

  • creator username
  • date of creation

I'm curious about these two. I do have some uses for them in mind but I'd like to hear what you're thinking. What would you use them for?

  • the link of the worklist

Do you have any thoughts on how we'd generate such a link?

  • all users who claim articles

Just FYI, there are some rules for using user information on Toolforge: https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use#What_can_and_can’t_be_done_with_user_information?

No need to change the proposal based on that, just thought I'd mention so you're aware that we're supposed to keep user information to a minimum and we may be required to display a notice upon login (and maybe also a deletion policy for user information).

I have not yet decided on any specific JS framework or library. jQuery is the only one I have in mind.

No problem, we could consider something like React depending on the complexity of the client side, but plain jQuery is totally fine.

The results of a PetScan query can change over time as articles change categories, for example. Would we re-run the query to refresh the worklist? How?

I was also thinking about this possibility. But there is something to consider before deciding: I plan to allow the creator of the worklist to manually add or delete articles. So, if we re-run the PS query, we will have to overwrite the articles stored in the db, and thus lose the information added manually. To avoid this, I have decided not to activate this (refresh) button in the first release of the app.

We don't really have to overwrite _all_ articles when we update the worklist right? We could overwrite only the ones that came from PetScan to begin with.

Also, you mentioned a button, so I'm assuming it would be users who would trigger a worklist refresh. That's fine, but we might also want to do something automated every once in a while.

Many thanks for your comments and suggestions, which are very interesting.

I think most of this information is already in my proposal, apart from the front end. When I say

To design the database architecture and refine specs

I mean everything about the db architecture and implementation. I can add more detail if you require it.

About your comments in general:

I'm curious about these two. I do have some uses for them in mind but I'd like to hear what you're thinking. What would you use them for?

A worklist is linked to an identified user (Wikipedia OAuth). The date attribute will help us sort worklists. In short, these 2 attributes are important because they will help us handle worklists easily.

Do you have any thoughts on how we'd generate such a link?

Not yet, but a little web search will help.
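One common approach, sketched here under assumptions, is to give each worklist a random URL-safe token and append it to the tool's base URL. `BASE_URL` and both function names are hypothetical; `secrets.token_urlsafe` is part of the Python standard library.

```python
import secrets

# Hypothetical base URL; the real one would depend on where the tool is hosted.
BASE_URL = "https://tools.example.org/worklist/"


def new_share_token(nbytes: int = 9) -> str:
    """Generate a random, URL-safe token (9 random bytes encode to 12 characters)."""
    return secrets.token_urlsafe(nbytes)


def share_link(token: str) -> str:
    """Build the shareable link for a worklist from its stored token."""
    return BASE_URL + token
```

A random token, unlike a sequential id, cannot be guessed by counting, which may matter if some worklists are meant to be shared only with event participants.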

Just FYI, there are some rules for using user information on Toolforge: https://wikitech.wikimedia.org/wiki/Wikitech:Labs_Terms_of_use#What_can_and_can’t_be_done_with_user_information?

Thank you.

We don't really have to overwrite _all_ articles when we update the worklist right? We could overwrite only the ones that came from PetScan to begin with.

So, do we have to create 2 or more tables: one for automatic manipulation and another for manual manipulation?

Many thanks for your comments and suggestions, which are very interesting.

I think most of this information is already in my proposal, apart from the front end. When I say

To design the database architecture and refine specs

I mean everything about the db architecture and implementation. I can add more detail if you require it.

Yeah, we'd definitely be making changes to the spec as we go, but I think it would be nice to see more detail in the proposal to show that we have a good idea of how this would work in technical terms, and also because we'd be discussing implementation details all summer, so it's a good thing that we do it now a little as well.

I'm not looking for really detailed stuff like database schema definitions or a formal definition of an API for the server, but it helps to see what information is stored, and how it's used/modified at a high level given the features we're considering.

For example: "For each worklist, we store in a database its creation time, corresponding PetScan query, the username of its creator and the corresponding articles. The articles may be either manually added or expanded from the PetScan query, but only the creator of the worklist can do that. (...)"

So from that example paragraph I can tell why we want the username of the creator -- we'd be restricting some features so only the creator can use them.

Does this make sense? Let me know if I can help clarify this!

A worklist is linked to an identified user (Wikipedia OAuth). The date attribute will help us sort worklists. In short, these 2 attributes are important because they will help us handle worklists easily.

As I mentioned above, it would be good to see a little more detail on what features/functionalities depend on that information. Would we be just sorting worklists by date in the client-side UI only, or is there more to it (e.g., are you thinking of expiring old worklists for example)? For username -- yes, we can link a worklist to its creator, but why? What features depend on that?

We don't really have to overwrite _all_ articles when we update the worklist right? We could overwrite only the ones that came from PetScan to begin with.

So, do we have to create 2 or more tables: one for automatic manipulation and another for manual manipulation?

Either that or, for each article, remember whether it was added manually or by the PetScan query, in the same table. The important thing is that we could be keeping that information about where the article came from, and using it to decide whether to keep it or not -- we can figure out the exact database layout later.
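A minimal sketch of that idea, assuming each article carries a `source` field set to either "manual" or "petscan" (the field name, its values, and the `refresh_worklist` function are hypothetical): on refresh, manual entries are always kept, PetScan entries are kept only if the new query result still contains them, and newly matching titles are added.

```python
def refresh_worklist(current: dict, petscan_titles: set) -> dict:
    """Merge a fresh PetScan result into an existing worklist.

    `current` maps article title -> info dict with at least a "source" key.
    Returns a new mapping; `current` itself is not modified.
    """
    refreshed = {}
    for title, info in current.items():
        # Keep manual additions unconditionally; keep PetScan articles
        # only while the query still returns them.
        if info["source"] == "manual" or title in petscan_titles:
            refreshed[title] = info
    for title in petscan_titles:
        if title not in refreshed:
            # New result from the query: starts unclaimed at 0 %.
            refreshed[title] = {"source": "petscan", "claimed_by": None, "progress": 0}
    return refreshed
```

Existing claims and progress survive a refresh because surviving entries are carried over untouched. What to do with a claimed article that drops out of the query result is left open here; this sketch simply removes it.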