Develop a web app for patrolling based on the new ML-based service to predict reverts
Closed, ResolvedPublic

Description

IMPORTANT: Make sure to read the Outreachy participant instructions and communication guidelines thoroughly before commenting on this task. This space is for project-specific questions, so avoid asking questions about getting started, setting up Gerrit, etc. When in doubt, ask your question on Zulip first!

Brief summary

The Research team is working on improving the Machine Learning based tools to support Wikipedia Patrollers. As part of this effort we are developing a new model to detect revisions that require patrollers' attention (T314385).

The current model is based on implicit user feedback (revisions that have been reverted) and recommends which revisions are likely to be reverted. To improve this model and make its output easier to use, we want to build a web app that allows users to 1) rate the quality of the recommendation and 2) directly revert edits based on these recommendations.

Inspired by the SpeedPatrolling Tool, we want to build an app that lets users give explicit feedback on our recommendations and directly revert revisions when needed. The app should connect to our API to pull the recommendations and show them to users. It should also save the users' feedback so that we can retrain or fine-tune our existing model.
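To make the feedback-saving requirement more concrete, here is a minimal sketch of how explicit feedback could be stored for later retraining. It is not the team's design: the table layout, the function names (`open_feedback_store`, `save_feedback`, `training_rows`) and the choice of SQLite are all assumptions made for illustration.

```python
import sqlite3
from typing import List, Tuple


def open_feedback_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical feedback table; the schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            rev_id        INTEGER NOT NULL,  -- revision shown to the patroller
            username      TEXT    NOT NULL,  -- who gave the feedback
            model_score   REAL    NOT NULL,  -- revert probability shown in the app
            should_revert INTEGER NOT NULL,  -- patroller's explicit judgment (0/1)
            reverted      INTEGER NOT NULL,  -- whether they reverted from the app (0/1)
            PRIMARY KEY (rev_id, username)
        )
        """
    )
    return conn


def save_feedback(conn: sqlite3.Connection, rev_id: int, username: str,
                  model_score: float, should_revert: bool, reverted: bool) -> None:
    """Store one rating; a second rating by the same user replaces the first."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO feedback VALUES (?, ?, ?, ?, ?)",
            (rev_id, username, model_score, int(should_revert), int(reverted)),
        )


def training_rows(conn: sqlite3.Connection) -> List[Tuple[int, float, int]]:
    """Return (rev_id, model_score, label) rows that a retraining job could consume."""
    cursor = conn.execute(
        "SELECT rev_id, model_score, should_revert FROM feedback ORDER BY rev_id"
    )
    return cursor.fetchall()
```

Keeping the model score next to the human label means disagreements between the model and patrollers can be inspected before any retraining.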

Skills required

  • Intermediate JavaScript
  • HTML & CSS
  • Familiarity with Flask
  • Design/UX skills welcome but not required
  • Experience with Sklearn welcome but not required

Mentor(s)

@MunizaA
@diego

Microtasks

  • Make sure that you can login to the PAWS service with your wiki account: https://paws.wmflabs.org/paws/hub
  • Using this notebook as a starting point, create your own notebook (see these instructions for forking the notebook to start with) and complete the functions / analyses. All PAWS notebooks have the option of generating a public link, which can be shared back so that we can evaluate what you did. Use a mixture of code cells and markdown to document what you find and your thoughts.
  • As you have questions, feel free to add comments to this task (and please don't hesitate to answer other applicants' questions if you can help)
  • If you feel you have completed your notebook, you may request feedback and we will provide high-level feedback on what is good and what is missing. To do so, send an email to your mentor with the link to your public PAWS notebook. We will try to make time to give this feedback at least once to anyone who would like it.
  • When you feel you are happy with your notebook, you should include the public link in your final Outreachy project application as a recorded contribution. You may record contributions as you go as well to track progress.

Event Timeline

This looks good! @diego @MunizaA One of you feel free to upload a proposal on the Outreachy website and I'll then approve it.

Hi everyone, I am Andy, an Outreachy applicant. Nice to meet you all.
I have a question: can anyone recommend a video tutorial for getting started with the MediaWiki API and mwapi?
I would be grateful.

Hi @Drew21-mch, I did not use any video tutorial. I read extensively about it instead.

Hi, nice to meet everyone here!

Looking forward to starting on the Microtasks and contributing to this project!

Hi all. I don't have one specific video that I can point to, but if you search the Web you will find several.
The Developers Portal could also be a good source of information.

Hello everyone, I am Oyindamola Olatunji, an Outreachy applicant. I look forward to contributing to Wikimedia.

Hi everyone. I am very interested in working on this project, as it builds on my long-term goal of developing and engineering artificial intelligence systems. Please feel free to ask me any questions on the public chat and I will do my best to help. I am looking forward to a successful collaboration with the mentors and the entire team. Great to be on board with open source!

Adding myself as a subscriber here. Hi everyone, I'm the Product Manager for the Android app. We are going to be working on patrolling features in the Android app in the coming months, so perhaps there is some alignment. I'll keep an eye out here and ping @srishakatux.

Hi everyone, I'm Marian, an Outreachy applicant. Nice to meet you.
I'm very happy about and interested in working on this project. I don't have any experience in machine learning, but I'm interested and really want to learn and work in this area, so I hope we will help each other and share a lot.
Thanks.

I have a question about the Microtasks section, which I don't understand at all. Is completing that task mandatory? If yes, could you please briefly explain the notebook task to me?
Thanks.

Hi @Marian2023, you first have to download the notebook: download it as a .txt file, change its extension to the Jupyter notebook extension (.ipynb), then upload it to PAWS.

I think we are expected to attempt all the tasks given. They are in the notebook's sections: recent edits, damaging edits, comparing edits, and analysis. In summary, you'll be working with the MediaWiki APIs to answer specific questions in the notebook.

@diego I ran into some issues when comparing the revisions for all 5000 recent changes: their individual wikitext contents are quite large, and running them through mwedittypes.SimpleEditTypes() returns an empty output.

Hi, @Caseyy0000, can you share your code that returns an empty output? I have experimented with wikitext contents and was able to get the content difference. Happy to help you debug if I can reproduce the error. Thanks!

Hi, @Marian2023! I think the overall purpose of the notebook is to help familiarize us with various wiki APIs, so that we get a feel for the dataset we are going to work with, especially the various features that could potentially be used in the ML model to predict reverts. The best way forward, in my opinion, is to follow the step-by-step instructions in the notebook that @diego has provided and ask specific questions here when you encounter a problem.

I find the research paper "ORES: Lowering Barriers with Participatory Machine Learning in Wikipedia" very helpful for understanding the technical background of the ORES model as well as its wider implications.

My question, however, is about this specific internship task: "Develop a web app for patrolling based on the new ML-based service to predict reverts". Are we expected to 1) only add features to a web app, e.g. highlighting certain features to make it easier to get patrollers' attention, OR 2) also get involved in improving the next iteration of the ORES model, e.g. by suggesting new features?

Currently, I am trying to learn about the ORES model but can't find where the latest model, v3, lives. I am still going through the ORES code base. I would highly appreciate some hints or guidance if someone has already found it. Thanks in advance!

Hi @Caseyy0000. Please check the documentation for mwedittypes here. The library may fail in some specific cases, but that should be very exceptional.

Thanks @Claire3z!

I'm trying to get the first 5000 edits but I see that the rclimit max is 500?

Hi @uzor13. According to the python-mwapi documentation, there is a continuation parameter that makes the request return an iterable over successive responses.
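For anyone who wants to see what continuation does under the hood, here is a minimal sketch of the idea: request one page of at most 500 changes, then resend the query with the `continue` values from the previous response until enough changes are collected. The function name `iter_recent_changes` and the `fetch_page` callable are hypothetical; `fetch_page` stands for anything that sends the parameters to the Action API and returns the decoded JSON.

```python
from typing import Any, Callable, Dict, Iterator

API_MAX_RCLIMIT = 500  # per-request cap mentioned in this thread


def iter_recent_changes(
    fetch_page: Callable[[Dict[str, Any]], Dict[str, Any]],
    total: int = 5000,
    page_size: int = API_MAX_RCLIMIT,
) -> Iterator[Dict[str, Any]]:
    """Yield up to `total` recent changes by following `continue` tokens."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rclimit": min(page_size, API_MAX_RCLIMIT),
        "format": "json",
    }
    continuation: Dict[str, Any] = {}
    yielded = 0
    while yielded < total:
        # Merge the continuation values from the previous response, if any.
        response = fetch_page({**params, **continuation})
        for change in response.get("query", {}).get("recentchanges", []):
            yield change
            yielded += 1
            if yielded >= total:
                return
        if "continue" not in response:
            return  # the API has no more results
        continuation = response["continue"]
```

Usage would look like `changes = list(iter_recent_changes(my_fetch, total=5000))`, where `my_fetch` is your own wrapper around the HTTP call. With python-mwapi, the library's continuation option should make this loop unnecessary.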

Thanks

Hello everybody,

Thanks for all your work. I hope that you're enjoying this process and that this project brings you closer to working with Wikipedia data.

We have received several e-mails asking for feedback. Due to time constraints, and to make this process as fair as possible, we can provide individual (and very general) feedback only once per applicant. Therefore, if you want feedback from @MunizaA and me, be aware that from now on you can send us your notebook just once, and we will do our best to provide feedback within this week (by Friday).

For those who have already written to us, please let us know (in the same e-mail thread) whether the last notebook you sent is the one you want feedback on; if not, please tell us to wait and send your most up-to-date one.

Considering the questions we have already received, let me make a few comments:

  • Yes, you need to work with 5000 (not 500) edits. If you have doubts about how to collect them, check the conversations above or look into the MediaWiki (and python-mwapi) documentation.
  • The analysis part is what can make your notebook shine. Be creative. Show your analytical and visualization skills. Work on creating appealing and informative visualizations and draw your conclusions from the data.

I hope this helps!
Best

Hi @diego, should we mention a single contribution on the Outreachy website or mention our contribution in multiple parts?

Hi @Sannan2252, you should explain your main contribution, and the notebook should also be self-explanatory.

Please remember to record your contributions on the Outreachy website! The deadline is (today) Nov 4th!

@Sheilakaruku Hello! I didn't see any other task related to your project, so I am assigning this to you. As it's been a few weeks since the internship started, I am asking all interns to share a few updates (in 3-4 sentences) on their project progress in a comment on the relevant Phabricator task. I'd encourage you to do the same. For other reminders, please see my message on Zulip. cc @diego @MunizaA

Hello @srishakatux, since I started I've managed to tackle the tasks in the notebook related to the project, to familiarize myself with the different APIs we'll be using. I've also started working with Toolforge and getting familiar with it. I have managed to do every task given to me by @diego and @MunizaA so far. This evening we will be discussing the workflow of the project.

Hello @srishakatux, in our last meeting my mentors and I discussed data mining. I'm to read some documentation on it and work with the Wikimedia EventStreams.

Update

We are implementing the Recentchanges and Revisions API endpoints this week.

@Sheilakaruku I'd appreciate your help in adding final project outcomes here: https://www.mediawiki.org/wiki/Outreachy/Past_projects#Round_25 and help resolve this task if you consider it as done. Thank you :) cc @diego @MunizaA

Closing this task as Outreachy-25 finished three months ago and this task is tagged with Outreachy-25 only. (If this task should remain open, feel free to reopen and add an active project tag so this task can be found on a corresponding workboard - thanks!)